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PARALLEL SOLUTION OF THE LINEAR ELASTICITY PROBLEM 
WITH APPLICATIONS IN TOPOLOGY OPTIMISATION 

J. TURNER*, M. KOCVARA*, AND D. LOGHIN* 

Abstract. In this paper, we aim to solve the system of equations governing linear elasticity 
in parallel using domain decomposition. Through a non-overlapping decomposition of the domain, 
our approach aims to target the resulting interface problem, allowing for the parallel computation 
of solutions in an efficient manner. As a major application of our work, we apply our results to 
the field of topology optimisation, where typical solvers require repeated solutions of linear elasticity 
problems resulting from the use of a Picard approach. 
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1. Introduction. Consider a solid elastic body occupying an open and con¬ 
nected domain C with Lipschitz boundary 512 = dVlu U dVljsr^ where clamping 
and traction are imposed on 512£> and 512jy respectively. Under the application of 
both body forces /: U — >• and boundary tractions g: 512 at — >• the material is 
subject to deformation so that a given reference point x of the initial undeformed 
material is translated to the vector x' = x -|- m(x) of the deformed material, with u 
denoting the displacement. Through the assumption of linearly elastic material be¬ 
haviour, the governing equations for u correspond to the following mixed boundary 
value problem 


(1.1a) 

(1.1b) 

(1.1c) 

(l.ld) 


Cu ■■= V • cr{u) = f, in 12, 

a{u) = E: e{u), in 12, 

M = 0, on 5121), 

a{u)-h = g, on 512TV, 


In the above, the strain caused as a result of the displacements u is characterised by 
the symmetric linearised strain tensor 


e(u) = {eij(M)}fj=i, 


eij(u) 


1 / dui duj\ 

2 ) 


Additionally, n corresponds to the unit outward pointing normal vector on 512 and 
E ■■= U(x) denotes the fourth order elasticity tensor, describing the elastic stiffness 
of 12 as a result of the load placed upon it. 

We will consider the case where our body consists of one or more isotropic materi¬ 
als (i.e: rotational and directional independence). Equation (ll.lbl) describing Hooke’s 
Law can be written as 


a{u) = 2fie{u) -I- Atr {e{u)) /, 
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where / represents the identity matrix of appropriate size, tr(A) denotes the trace 
of a matrix A and both /t and A correspond to Lame constants defined in the usual 
manner 

E - ly'E 

2(1 + 1.)’ + 

with E > 0 corresponding to Young’s modulus and —l<t^<l/2 the Poisson ratio. 
We now look to apply domain decomposition to the problem (HH). To do this, we 
divide our domain n into N nonoverlapping subdomains fli with local boundaries 
d^i and outer unit normals iii. We denote by P the resulting skeletal interface 
P = U^j^Pi where P^ — d^li\dfl and by I — fi\r the set of interior nodes, with 
=: Ui- Assuming that the restriction of Ui to components of the skeletal interface 
is known, problem (ED is equivalent to the following set of subproblems 


( 1 . 2 ) 


( Cui = fi, 

I Ui = 0, 

I a {ui) ■ n* = g^, 

V — Ai, 


in 111 , 

on dflo n d^i, 
on 90AT n 90i, 
on Pi, 


where i = ... ,N. By writing Ui = + uf‘\ we look to describe an appropriate 

interface operator that will allow problems to be decoupled and thus solved strictly 
on subdomains in parallel. Through the definition of matrix extension operators Eli 
that map interface data to relevant subdomains via u'^'^ = iL^A, the system ED can 
be decoupled into the following 2Y + 1 subproblems 


(1.3a) 


(1.3b) 


(1.3c) 


= fi, 

in flj. 

= 0, 

on dVlD n 9n 

E w = g^, 

on 90 AT n 90 

uf'’ = 0, 

on Pj. 


i') 


N 

E 


(j{HiX) ■ rii = 


N 

^cr(Mf^)-nj, on P. 
2=1 


o' 

II 

in Oj, 

ur = 0, 

on 90 d n 90 

E w = 0, 

on 90AT n 90 

(2) A 

u\ = \i, 

on P^. 


ii 

ii 


The associated weak form to problem (ll.3bl) is referred to as the Steklov-Poincare 
equation, where the so-called Steklov-Poincare pseudo-differential operator S': Ag —>• 
Ag defined in the following manner m 


s(A,r7) 


N r 


(SA,?7) := ^ 


2=1 


{a{Hi\i) ■nj)?7^ds 


N 

=■■ ^(SjAi,?7,), 

i=l 


where X,r] € Ag and A|j, =: Aj, Tyjj, =: rji. The space Ag is chosen to be a suitable 
fractional Sobolev space of index 6 based on the boundary conditions of the problem, 
dependent on the intersection of P with dO. [iiin]. 
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2. Matrix Formulation. Through a finite element discretisation of the weak 
formulation, it can be shown that the discrete formulation to the original problem (ED 
requires the solution of a matrix-vector system m- By distributing nodes based on 
their location within the domain, we can view this system as follows 


( 2 . 1 ) 


where 


( 2 . 2 ) 


Ku = 


(Kjj 



= f, 


N 

Ku ■■= 0 Kiu , u = (u,, ur)^ G 

i=l 


In comparison, the corresponding matrix formulations for each of the discrete weak 
formulations to the problems presented in (11.21) can be written down as 


r = fi, 

(2.3) < S'up = fp — 

[ Kiiu-p = -Kirur, 


with global solution u = 0^ -|- up^ . In the above, the matrix S corre¬ 

sponds to the Schur complement, and so the discretisation of the decoupled problem 
EH) can be viewed as a Schur complement approach to the discretisation of the global 
problem ED- Using ED and (ED, we are able to view p.3ll in terms of 2N + 1 
subproblems in the following way 


(2.4) 


=(ii, i = 

N 

< 5'up = fp — ^ 

i—1 

^ Kuum^j^ = —Ki-rur, i = 1,..., N. 


We therefore look to solve ED by exploiting the potential for parallelisation present 
in (1^ . 

3. Preconditioning. The systems we expect to solve will typically be both 
sparse and large scale, due to the expected fineness of the finite element discretisation 
required in modern design processes, allowing for the computation of resolute solu¬ 
tions. This is of particular importance for domains containing sharp jumps, occurring 
for instance due to predefined fixed or void regions. Therefore, it is appropriate to 
consider iterative solution techniques when solving systems of the form ED- For our 
problem, we will consider GMRES [8] for reasons to be described below. 

In order to avoid the direct construction and application of the Schur complement 
matrix, and to improve the spectral properties of the system matrix, we seek an 
appropriate preconditioner for the system ED- Through the following choice of P, 
we see that 


[Ku Kir\ 

[O S )’ 


KP-^ 


in 0 \ 

KtiKjI /rr/ 
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The minimum polynomial of KP ^ is (A — 1)'^, suggesting that iterative solution 
methods such as GMRES will converge in at most d iterations [5]. Based on this, we 
propose to precondition from the right with an approximation P of P as follows 

KP-^u = i, u = Pu, 

where 

~ ^ fKjj K^r\ p_i ^ /if,-/ 0 \ fin -KjA fin J) \ 

Vo S Vo irrj\0 Irr J \ 0 

with S representing an approximation to the discrete Steklov-Poincare operator. We 
therefore seek a representation of S that is not only practical to invert but can also 
be seen to provide an appropriate preconditioning strategy for the resulting interface 
problem. ^ 

The form of S chosen is based on work in [1], where discrete norm representa¬ 

tions for projections of the interpolation spaces Ag onto suitable finite dimensional 
subspaces are described and analysed. Discrete norms of the form 

(3.1) Hg = M + 6»G [0,1], 

are shown to be equivalent to their continuous counterparts on Ag, where M and L 
denote the mass and Laplacian matrices respectively assembled on the interface T. 
One particular example corresponds to 0 = 1/2, which is shown in [1] to adhere to 
the same coercivity and continuity bounds as the discrete Steklov-Poincare operator, 
leading to mesh independent performance of GMRES. The norms presented in (13.1|) 
can be shown to be spectrally equivalent to 

iig = M{M-^Ly-^. 


Both of the above can be applied component-wise to a system, suggesting an appro¬ 
priate form of S as 

d 

(3.2) Hg=^Hg. 

1 

Erom the above, it is clear that fractional powers of matrices must be determined in 
order to apply the discrete norms. For relatively small problems, this can be achieved 
through direct methods such as a generalised eigenvalue decomposition. However, the 
complexity involved is O (rip) suggesting instead the use of iterative approaches for 
larger problems. In [T], approximations through the use of truncated Lanczos and 
inverse Lanczos algorithms are described, and will also be employed within this work 
through the use of flexible GMRES [7] to account for the changing nature of the 
preconditioner. 

4. Results. We present various results in this section to illustrate our approach 
in practice. It should be noted that while certain examples involve symmetric sys¬ 
tem matrices, our choice of non-symmetric preconditioner suggests GMRES as an 
appropriate Krylov solver. 

The test problem considered involves a cantilever beam over the 2D domain fi = 
(0,2) X (0,1), with downward force f = 0.75 and outward traction g = 1. The 
domain will be clamped on the right hand side through the application of homogeneous 
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g 

Y 


Fig. 1. Original decomposed undeformed layout (left) and deformed layout (right). Location of 
clamping and traction illustrated by cyan and black nodes, respectively. 




Dirichlet conditions on the relevant boundary. An illustration of the deflection as well 
as a pictorial example of a division of the domain (into 16 subdomains) is provided 
in Figure [TJ 

As discussed in the previous section, iterative approaches will be used for the 
application of Hg. For this work, it was found that the inverse Lanczos approach 
delivered the most promising results, largely due to the relatively small number of basis 
vectors required to apply the discrete norms. Table[T]illustrate^ our results for differing 
mesh parameters h and subdomains. The column labelled S = I illustrates results 
for both test problems in the absence of interface preconditioning. By reading this 
column from top to bottom (for each problem), we observe a logarithmic dependence 
on the number of GMRES iterations for increasing mesh parameters. Reading this 
column from left to right also suggests a logarithmic dependence on the number of 
subdomains. ^ 

In comparison, the column labelled S = Hg of the table provides results with the 
interface preconditioner as discussed in (13.21) . Here, it can be seen that the number of 
iterations are independent of the chosen mesh parameter. Whilst there is a logarith¬ 
mic dependence on the number of iterations fen an increasing number of subdomains, 
a direct comparison with the column labelled S = I suggests that our preconditioning 
strategy provides significant savings in the number of iterations required for conver¬ 
gence. ^ 

The final column labelled S = i?oPT illustrates results for selected values of theta 
based on testing. It was found that the recorded values were able to provide improved 
results over the other two columns, suggesting that different values of theta are able 
to provide a closer approximation to the decay of the associated Steklov-Poincare 
operator. 



II 

S = He 

S = i^OPT 

# Domains 

4 

16 

64 

4 

16 

64 

4 

16 

64 

9 

- 

- 

- 

0.5 

0.5 

0.5 

0.5 

0.6 

0.7 

h = 1/32 

28 

47 

68 

12 

18 

27 

12 

17 

22 

1/64 

41 

66 

96 

12 

19 

27 

12 

18 

23 

1/128 

59 

93 

137 

12 

19 

27 

12 

19 

24 


Table 1 

Results for the test problem. Tolerance of GMRES set at 10~®. 


In order to observe the computational benefits of our method, we look to pro¬ 
vide rough estimates in order to gauge how our derived approach will perform in a 
parallel environment. Due to the non-overlapping nature of our approach, all subdo- 
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main solves can be carried out in parallel. As mentioned previously, the main issue 
surrounds the solution to the resulting interface problem. Within each application 
of our preconditioner to this problem, we are required to invert the discrete interface 
Laplacian. This issue is present in the Lanczos process, and also in the subsequent 
generalised eigenvalue decomposition that follows. Due to the structure of this matrix, 
these inversions can lead to a computational bottleneck for an increasing number of 
subdomains, and so we would like to consider an iterative approach to alleviate this 
issue. 

The structure of the involved matrix suggests conjugate gradient as a suitable al¬ 
ternative, coupled with an appropriate preconditioning strategy (PCG). In this work, 
we propose to precondition by using the relevant contributions of L restricted to Tj, 
with the cross points removed to enable construction in parallel. The parallel CPU 
time taken for each GMRES iteration can then be realised by dividing the number of 
PCG iterations multiplied by the CPU time taken to apply the preconditioner by the 
total number of faces involved in the construction of P. By adding this contribution 
to the CPU time taken for one parallel subdomain solve, we calculate the total CPU 
time by multiplying the result to the total number of GMRES iterations required to 
achieve convergence. 

The results for the investigation are displayed in Table [2] where GPU times (in 
seconds) are provided for differing mesh and subdomain sizes. A Linux machine with 
an Intel® Gore"'’'^ i7 GPU 870 @ 2.93 GHz with 8 cores was used to obtain the data. 



S — Hqpt 

ff Domains 

4 

16 

64 

256 

6 

0.5 

0.6 

0.7 

0.75 

h = 1/16 

0.0169 

0.0168 

0.0176 

0.0254 

1/32 

0.0635 

0.0273 

0.0199 

0.0232 

1/64 

0.4455 

0.1238 

0.0384 

0.0238 

1/128 

3.8716 

1.0804 

0.2529 

0.0623 

1/256 

50.2476 

13.2858 

3.3204 

0.7295 


Table 2 

Total CPU times (seconds) anticipated through the use of parallel computing. 


From the table, it can be seen that for relatively coarse meshes, we do not see 
a significant enough decrease in the GPU time to warrant the use of parallelism. 
This behaviour can be attributed to the computational complexity of sparse matrix 
inversion {0{k‘^n), k is the bandwidth) for relatively small values of n, and also 
the efficiency of the backslash command in MATLAB. However, notable savings in 
CPU time equating to roughly factor 4 can be seen for finer meshes. These figures 
are encouraging, as they suggest that our approach is capable of significant speedup 
through the use of parallel architecture when compared directly to solving the problem 
globally on a single processor. 

After collating the results in Table [2l a general increase was noted in the number 
of GMRES iterations when compared directly to the figures obtained in Table [T] The 
reason for this can be attributed to the use of inner PCG iterations. In particular, 
a logarithmic dependence on the mesh parameter was observed for cases involving 
smaller numbers of subdomains. However, the deterioration can be seen as an ac¬ 
ceptable compromise, as the results for larger meshes suggest the use of an increasing 
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number of subdomains for improved performance. It should be noted that the results 
obtained above were done so with a relatively coarse tolerance for PCG of 10“^, as 
well as a reasonably modest number of PCG iterations (typically between 2 and 12) 
at each GMRES iteration for each of the test cases considered. 

It should not be expected that continual speedup can be gained through the use 
of an increasing number of subdomains, as certain factors such as inter-processor 
communication between each of the three steps will begin to play an important role. 
Therefore, in terms of a regular subdivision, this would suggest an optimal decomposi¬ 
tion of the domain based on the mesh parameter, and also possibly other contributing 
factors relating to computer hardware. 

5. Topology Optimisation. As an application of our findings, we will describe 
how our work can be incorporated into commonly used solvers from problems arising 
in topology optimisation. The problem we consider here is the so-called Variable 
Thickness Sheet problem [2j pp. 54 - 57], which can be described mathematically 
using finite elements by the following nonlinear optimisation problem 


(5.1) 


min f^u 

u,p 

subject to: K{p)u. = i 

iGD 

0 < p < Pi <p 


Kip)=J2p^K^), 


i=l 


Vi € H — {1, 2,..., m} . 


In the above, the n—vector of nodal displacement values u = u(p) denotes the 
solution to the elasticity equations, with f G K" representing the corresponding dis¬ 
cretisation of the load linear form. The density p G is subject to upper and 
lower bounds p and p respectively, with the volume of the body being denoted by 
V. Additionally, K{p) represents the finite element stiffness matrix for the elasticity 
equations, with each Ki, i = 1,..., m denoting elemental stiffness matrices. 

By considering the method of Lagrange multipliers, minima to (15.11) are obtained 
through a nonlinear system of equations. The nonlinearities can be dealt with using 
a number of commonly used approaches. For instance, one could consider the use 
of interior point methods [3]. The fairly standard solution technique used by the 
community involves the consideration of fixed point type update schemes for an initial 
guess for the density in the following way 

1. Finite Element Analysis (FEA) solve equations of linear elasticity. 

2. Density update e.g.: OG (see El), MMA (see i)- 

3. Gheck for convergence. If not satisfied, rerun 1 and 2 using updated density. 

It can be expected that a reasonably large number of fixed point iterations are re¬ 
quired to obtain a suitable final design. The bulk of computational effort will be 
concentrated on the Finite Element Analysis step, namely the repeated process of 
obtaining updated displacement variables through the use of the equations of linear 
elasticity [3]. Therefore, we propose to apply our preconditioning strategy as discussed 
in Section [3] to this problem coupled with the fairly straightforward Optimality Crite¬ 
ria (OC) method for the density update. No attempt will be made here to carry out 
Step 2 above in parallel; however [3] describe an appropriate implementation using 
the Method of Moving Asymptotes (MMA). 
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# Domains 

4 

16 

64 

256 

0 

0.5 

0.6 

0.7 

0.75 

h = 1/16 

10 (10) 

10 (18) 

10 (33) 

10 (56) 

1/32 

17 (11) 

17 (18) 

17 (34) 

19 (54) 

1/64 

23 (11) 

23 (18) 

24 (32) 

27 (54) 

1/128 

29 (12) 

30 (17) 

32 (31) 

32 (52) 

1/256 

33 (13) 

36 (17) 

37 (31) 

41 (49) 


Table 3 

Results for the cantilever beam problem solved using our preconditioning strategy for the FEA 
coupled with the OC method for the density update. 


In Table [3l results are provided illustrating the performance of our approach for 
the cantilever beam problem. The results were obtained using an adaptive tolerance 
for GMRES based on successive compliance values. The total number of fixed point 
iterations are given, along with the average number of GMRES iterations per fixed 
point step (bracketed). Whilst the number of fixed point iterations appears to in¬ 
crease for finer meshes, the average number of GMRES iterations remains roughly 
constant. Whilst we still see a logarithmic dependence on the average number of 
GMRES iterations for an increasing number of subdomains, the fixed point iterations 
remain roughly constant. 

Future work involves validation of our approach on a parallel machine, as well as 
consideration of further problems (possibly to include 3D domains) and alternative 
solution methods to try to solve topology optimisation problems completely in parallel. 
We expect our approach to adapt well in parallel, with potential speedup for 3D 
problems of factor 8 anticipated. 
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